A Stemming Procedure and Stopword List for General French Corpora

Author

  • Jacques Savoy
Abstract

Due to the increasing use of network-based systems, there is a growing interest in access to and search mechanisms for text databases in languages other than English. To adapt search systems to languages whose characteristics are similar to those of English, for the most part all we need to do is establish a general stopword list and a stemming procedure. This article presents the tools needed to establish these for French-language databases, along with retrieval experiments carried out on two medium-sized French-language test collections. These experiments were conducted to evaluate the retrieval effectiveness of the proposals described.
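To make the two components named in the abstract concrete, here is a minimal Python sketch of stopword removal followed by light suffix stripping. It is not the procedure from the article: the stopword sample, the suffix rules, and the names `STOPWORDS`, `SUFFIXES`, `light_stem`, and `index_terms` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword sample (assumption, not the article's list).
STOPWORDS = {"le", "la", "les", "de", "des", "du", "un", "une", "et", "en", "dans", "pour"}

# Illustrative plural/feminine endings, tried in this order (assumption, not the article's rules).
SUFFIXES = ("es", "s", "e")


def tokenize(text):
    """Lowercase the text and split it into runs of (possibly accented) letters."""
    return re.findall(r"[a-zàâçéèêëîïôûùüÿœ]+", text.lower())


def light_stem(word, min_len=3):
    """Strip the first matching suffix, keeping at least min_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Drop stopwords, then stem the remaining tokens."""
    return [light_stem(token) for token in tokenize(text) if token not in STOPWORDS]


if __name__ == "__main__":
    print(index_terms("Les bases de données"))
```

With these toy rules, singular and plural forms such as "base" and "bases" map to the same index term, which is the effect a stemming procedure aims for. A realistic stopword list and rule set would be considerably larger.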


Similar resources

Report on CLEF-2001 Experiments: Effective Combined Query-Translation Approach

In our first participation in CLEF retrieval tasks, the primary objective was to define a general stopword list for various European languages (namely, French, Italian, German and Spanish) and also to suggest simple and efficient stemming procedures for these languages. Our second aim was to suggest a combined approach that could facilitate effective access to multilingual collections. 1 Monoli...


Report on CLEF-2001 Experiments

For our first participation in CLEF retrieval tasks, our first objective was to define a general stopword list for various European languages (namely, French, Italian, German and Spanish) and also to suggest simple and efficient stemming procedures for them. Our second aim was to suggest a combined approach that might be implemented in order to facilitate effective access to multilingual collec...


Data Fusion for Effective European Monolingual Information Retrieval

For our fourth participation in the CLEF evaluation campaigns, our first objective was to propose an effective and general stopword list and a light stemming procedure for the Portuguese language. Our second objective was to obtain a better picture of the relative merit of various search engines when processing documents in the Finnish and Russian languages. Finally, based on the Z-score method...


Corpora Preparation and Stopword List Generation for Arabic data in Social Network

This paper proposes a methodology for preparing Arabic-language corpora from online social networks (OSN) and review sites for the Sentiment Analysis (SA) task. The paper also proposes a methodology for generating a stopword list from the prepared corpora. The aim of the paper is to investigate the effect of removing stopwords on the SA task. The problem is that the stopwords lists generated before w...


Monolingual, Bilingual, and GIRT Information Retrieval at CLEF-2005

For our fifth participation in the CLEF evaluation campaigns, our first objective was to propose an effective and general stopword list as well as a light stemming procedure for the Hungarian, Bulgarian and Portuguese (Brazilian) languages. Our second objective was to obtain a better picture of the relative merit of various search engines when processing documents in those languages. To do so w...



Journal title:
  • JASIS

Volume 50  Issue 

Pages  -

Publication date 1999